discourse analysis
Practical Machine Learning for Aphasic Discourse Analysis
Pittman, Jason M., Phillips, Anton Jr., Medina-Santos, Yesenia, Stark, Brielle C.
Analyzing spoken discourse is a valid means of quantifying language ability in persons with aphasia. There are many ways to quantify discourse, one common way being to evaluate the informativeness of the discourse. That is, given the total number of words produced, how many of those are context-relevant and accurate. This type of analysis is called Correct Information Unit (CIU) analysis and is one of the most prevalent discourse analyses used by speech-language pathologists (SLPs). Despite this, CIU analysis in the clinic remains limited due to the manual labor needed by SLPs to code and analyze collected speech. Recent advances in machine learning (ML) seek to augment such labor by automating modeling of propositional, macrostructural, pragmatic, and multimodal dimensions of discourse. To that end, this study evaluated five ML models for reliable identification of Correct Information Units (CIUs, Nicholas & Brookshire, 1993), during a picture description task. The five supervised ML models were trained using randomly selected human-coded transcripts and accompanying words and CIUs from persons with aphasia. The baseline model training produced a high accuracy across transcripts for word vs non-word, with all models achieving near perfect performance (0.995) with high AUC range (0.914 min, 0.995 max). In contrast, CIU vs non-CIU showed a greater variability, with the k-nearest neighbor (k-NN) model the highest accuracy (0.824) and second highest AUC (0.787). These findings indicate that while the supervised ML models can distinguish word from not word, identifying CIUs is challenging.
Fair Play in the Newsroom: Actor-Based Filtering Gender Discrimination in Text Corpora
Urchs, Stefanie, Thurner, Veronika, Aßenmacher, Matthias, Heumann, Christian, Thiemichen, Stephanie
Language corpora are the foundation of most natural language processing research, yet they often reproduce structural inequalities. One such inequality is gender discrimination in how actors are represented, which can distort analyses and perpetuate discriminatory outcomes. This paper introduces a user-centric, actor-level pipeline for detecting and mitigating gender discrimination in large-scale text corpora. By combining discourse-aware analysis with metrics for sentiment, syntactic agency, and quotation styles, our method enables both fine-grained auditing and exclusion-based balancing. Applied to the taz2024full corpus of German newspaper articles (1980-2024), the pipeline yields a more gender-balanced dataset while preserving core dynamics of the source material. Our findings show that structural asymmetries can be reduced through systematic filtering, though subtler biases in sentiment and framing remain. We release the tools and reports to support further research in discourse-based fairness auditing and equitable corpus construction.
Analyzing Biases in Political Dialogue: Tagging U.S. Presidential Debates with an Extended DAMSL Framework
Prahallad, Lavanya, Mamidi, Radhika
We present a critical discourse analysis of the 2024 U.S. presidential debates, examining Donald Trump's rhetorical strategies in his interactions with Joe Biden and Kamala Harris. We introduce a novel annotation framework, BEADS (Bias Enriched Annotation for Dialogue Structure), which systematically extends the DAMSL framework to capture bias driven and adversarial discourse features in political communication. BEADS includes a domain and language agnostic set of tags that model ideological framing, emotional appeals, and confrontational tactics. Our methodology compares detailed human annotation with zero shot ChatGPT assisted tagging on verified transcripts from the Trump and Biden (19,219 words) and Trump and Harris (18,123 words) debates. Our analysis shows that Trump consistently dominated in key categories: Challenge and Adversarial Exchanges, Selective Emphasis, Appeal to Fear, Political Bias, and Perceived Dismissiveness. These findings underscore his use of emotionally charged and adversarial rhetoric to control the narrative and influence audience perception. In this work, we establish BEADS as a scalable and reproducible framework for critical discourse analysis across languages, domains, and political contexts.
Mind2: Mind-to-Mind Emotional Support System with Bidirectional Cognitive Discourse Analysis
Hong, Shi Yin, Oyshi, Uttamasha, Mai, Quan, Nkhata, Gibson, Gauch, Susan
Emotional support (ES) systems alleviate users' mental distress by generating strategic supportive dialogues based on diverse user situations. However, ES systems are limited in their ability to generate effective ES dialogues that include timely context and interpretability, hindering them from earning public trust. Driven by cognitive models, we propose Mind-to-Mind (Mind2), an ES framework that approaches interpretable ES context modeling for the ES dialogue generation task from a discourse analysis perspective. Specifically, we perform cognitive discourse analysis on ES dialogues according to our dynamic discourse context propagation window, which accommodates evolving context as the conversation between the ES system and user progresses. To enhance interpretability, Mind2 prioritizes details that reflect each speaker's belief about the other speaker with bidirectionality, integrating Theory-of-Mind, physiological expected utility, and cognitive rationality to extract cognitive knowledge from ES conversations. Experimental results support that Mind2 achieves competitive performance versus state-of-the-art ES systems while trained with only 10\% of the available training data.
Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization
Detecting factual inconsistency for long document summarization remains challenging, given the complex structure of the source article and long summary length. In this work, we study factual inconsistency errors and connect them with a line of discourse analysis. We find that errors are more common in complex sentences and are associated with several discourse features. We propose a framework that decomposes long texts into discourse-inspired chunks and utilizes discourse information to better aggregate sentence-level scores predicted by natural language inference models. Our approach shows improved performance on top of different model baselines over several evaluation benchmarks, covering rich domains of texts, focusing on long document summarization. This underscores the significance of incorporating discourse features in developing models for scoring summaries for long document factual inconsistency.
Automatic deductive coding in discourse analysis: an application of large language models in learning analytics
Zhang, Lishan, Wu, Han, Huang, Xiaoshan, Duan, Tengfei, Du, Hanxiang
Deductive coding is a common discourse analysis method widely used by learning science and learning analytics researchers for understanding teaching and learning interactions. It often requires researchers to manually label all discourses to be analyzed according to a theoretically guided coding scheme, which is time-consuming and labor-intensive. The emergence of large language models such as GPT has opened a new avenue for automatic deductive coding to overcome the limitations of traditional deductive coding. To evaluate the usefulness of large language models in automatic deductive coding, we employed three different classification methods driven by different artificial intelligence technologies, including the traditional text classification method with text feature engineering, BERT-like pretrained language model and GPT-like pretrained large language model (LLM). We applied these methods to two different datasets and explored the potential of GPT and prompt engineering in automatic deductive coding. By analyzing and comparing the accuracy and Kappa values of these three classification methods, we found that GPT with prompt engineering outperformed the other two methods on both datasets with limited number of training samples. By providing detailed prompt structures, the reported work demonstrated how large language models can be used in the implementation of automatic deductive coding.
How People Perceive The Dynamic Zero-COVID Policy: A Retrospective Analysis From The Perspective of Appraisal Theory
Yang, Na, Zhou, Kyrie Zhixuan, Li, Yunzhe
The Dynamic Zero-COVID Policy in China spanned three years and diverse emotional responses have been observed at different times. In this paper, we retrospectively analyzed public sentiments and perceptions of the policy, especially regarding how they evolved over time, and how they related to people's lived experiences. Through sentiment analysis of 2,358 collected Weibo posts, we identified four representative points, i.e., policy initialization, sharp sentiment change, lowest sentiment score, and policy termination, for an in-depth discourse analysis through the lens of appraisal theory. In the end, we reflected on the evolving public sentiments toward the Dynamic Zero-COVID Policy and proposed implications for effective epidemic prevention and control measures for future crises.
Multi-class Categorization of Reasons behind Mental Disturbance in Long Texts
As per estimation, 8 million people could not get Consider a post in a social media platform - Reddit specialist help as they were not considered sick enough to posted in subreddit r/depression. The post is personally qualify. This situation underscores the need for automation written by a user which exhibit higher levels of emotions of mental health detection from social media data where and stances associated with mental and social well-being, people express themselves and their thoughts, beliefs/ emotions respectively. The post written by user is given as: with ease. This self-reported social media data is valuable but laborious for manual interpretations; thus, although = "I do not want to read literature but my complex, an automated system would significantly enhance parents forced me to do so. Not happy with my the ability to understand a social media user's state of mental grades" health. Amid COVID-19, the social NLP research community A major concern of user is about education stating the witness increase in the use of social media to express issue of forced subjects by parents that clearly indicate thoughts/ feelings and share life experiences Gianfredi, child's lack of interest affecting state of mind.
NLP for Climate Policy: Creating a Knowledge Platform for Holistic and Effective Climate Action
Swarnakar, Pradip, Modi, Ashutosh
Climate change is a burning issue of our time, with the Sustainable Development Goal (SDG) 13 of the United Nations demanding global climate action. Realizing the urgency, in 2015 in Paris, world leaders signed an agreement committing to taking voluntary action to reduce carbon emissions. However, the scale, magnitude, and climate action processes vary globally, especially between developed and developing countries. Therefore, from parliament to social media, the debates and discussions on climate change gather data from wide-ranging sources essential to the policy design and implementation. The downside is that we do not currently have the mechanisms to pool the worldwide dispersed knowledge emerging from the structured and unstructured data sources. The paper thematically discusses how NLP techniques could be employed in climate policy research and contribute to society's good at large. In particular, we exemplify symbiosis of NLP and Climate Policy Research via four methodologies. The first one deals with the major topics related to climate policy using automated content analysis. We investigate the opinions (sentiments) of major actors' narratives towards climate policy in the second methodology. The third technique explores the climate actors' beliefs towards pro or anti-climate orientation. Finally, we discuss developing a Climate Knowledge Graph. The present theme paper further argues that creating a knowledge platform would help in the formulation of a holistic climate policy and effective climate action. Such a knowledge platform would integrate the policy actors' varied opinions from different social sectors like government, business, civil society, and the scientific community. The research outcome will add value to effective climate action because policymakers can make informed decisions by looking at the diverse public opinion on a comprehensive platform.